AI Engineering Architecture

The simplest architecture

In its simplest form:

- the application receives a query
- the application sends the query to the model API
- the model generates a response
- the response is returned to the user
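
The four steps above can be sketched in a few lines. This is a minimal illustration, not a real integration: `fake_model_api` is a hypothetical stand-in for whatever model API client the application actually uses.

```python
from typing import Callable


def fake_model_api(prompt: str) -> str:
    """Hypothetical stand-in for a real model API call."""
    return f"[model response to: {prompt}]"


def handle_query(query: str, model_api: Callable[[str], str] = fake_model_api) -> str:
    """Receive a query, send it to the model API, return the response."""
    response = model_api(query)  # the model generates a response
    return response              # the response goes back to the user


print(handle_query("What is RAG?"))
```

Passing the model API in as a parameter keeps the rest of the pipeline independent of any particular provider, which makes it easier to add the components below.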

More components can be added

1. Enhance context

Placed between the query and the model API: enhance the context fed into the model by giving it access to external data sources and tools

- via Retrieval-Augmented Generation (RAG)
- via tools that let the model automatically gather information through APIs such as web search
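
As a rough sketch of the RAG path, the snippet below retrieves the most relevant document for a query and prepends it to the prompt. Everything here is an assumption made for illustration: `DOCUMENTS` is a hypothetical in-memory data source, and the word-overlap retriever stands in for a real keyword or embedding-based retriever.

```python
import re
from typing import List, Set

# Hypothetical external data source, kept in memory for the example.
DOCUMENTS = [
    "RAG retrieves relevant documents and adds them to the prompt.",
    "Guardrails filter unsafe inputs and outputs.",
    "Semantic caching reuses answers for similar queries.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Construct the enhanced context: retrieved documents followed by the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("How does RAG use documents?", DOCUMENTS))
```

The resulting string is what gets sent to the model API in place of the raw query.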

Quote: "Context construction is like feature engineering for foundation models."

2. Put in guardrails

Placed at inputs and outputs: Guardrails help protect you and your users

input guardrails
- protect against two types of risks
  - leaking private information to external APIs
  - executing bad prompts that compromise your system
- how it works
  - e.g. an automated detector flags sensitive data in the query -> either the entire query is blocked or the sensitive information is removed before the query is sent
output guardrails
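- catch output failures: quality failures (e.g. empty or malformed responses) and security failures (e.g. responses that leak sensitive information)
- specify a policy for handling each failure mode (e.g. retry or block)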
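
A minimal sketch of both guardrails follows. The regex patterns, the blocked phrase, and the `POLICY` table are all illustrative assumptions; production systems typically rely on dedicated detectors rather than a handful of hand-written rules.

```python
import re
from typing import Optional

# Illustrative patterns only, not a complete sensitive-data detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}
# Hypothetical marker of a prompt that tries to compromise the system.
BLOCKED_PHRASES = ("ignore previous instructions",)


def input_guardrail(query: str) -> Optional[str]:
    """Return a sanitized query, or None if the whole query is blocked."""
    if any(phrase in query.lower() for phrase in BLOCKED_PHRASES):
        return None  # block the entire query
    for label, pattern in PII_PATTERNS.items():
        query = pattern.sub(f"[{label}]", query)  # remove sensitive information
    return query


def output_guardrail(response: str) -> str:
    """Classify a response as 'ok', 'quality_failure', or 'security_failure'."""
    if not response.strip():
        return "quality_failure"
    if any(p.search(response) for p in PII_PATTERNS.values()):
        return "security_failure"
    return "ok"


# Policy: the action to take for each failure mode.
POLICY = {"ok": "return", "quality_failure": "retry", "security_failure": "block"}

print(input_guardrail("Contact jane@example.com"))
print(POLICY[output_guardrail("")])
```

Keeping the policy in a separate table means the handling of each failure mode can change without touching the detection logic.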

3. Add model router and gateway

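Placed in front of the models, within the model API layer: a router decides which model handles each query, and a gateway gives the application a single, controlled access point to the models; together they support complex pipelines and add more security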
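
One way to picture the two components, as a sketch under stated assumptions: `small_model` and `large_model` are hypothetical backends, the router uses a toy length heuristic instead of a real intent classifier, and the gateway only does key checks and usage counting.

```python
from typing import Callable, Dict, Set


def small_model(prompt: str) -> str:
    """Hypothetical cheap backend."""
    return f"small:{prompt}"


def large_model(prompt: str) -> str:
    """Hypothetical more capable backend."""
    return f"large:{prompt}"


class Gateway:
    """Single entry point to all models: access control plus usage tracking."""

    def __init__(self, models: Dict[str, Callable[[str], str]], allowed_keys: Set[str]):
        self.models = models
        self.allowed_keys = allowed_keys
        self.usage: Dict[str, int] = {}

    def call(self, api_key: str, model_name: str, prompt: str) -> str:
        if api_key not in self.allowed_keys:
            raise PermissionError("unknown API key")
        self.usage[model_name] = self.usage.get(model_name, 0) + 1
        return self.models[model_name](prompt)


def route(query: str) -> str:
    """Toy router: short queries go to the small model, the rest to the large one."""
    return "small" if len(query.split()) <= 5 else "large"


gateway = Gateway({"small": small_model, "large": large_model}, {"team-a-key"})
query = "Summarize the trade-offs between exact and semantic caching"
print(gateway.call("team-a-key", route(query), query))
```

Because every call passes through `Gateway.call`, access rules and usage accounting live in one place instead of being scattered across the application.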

4. Reduce latency and cost with caches

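Caches inside the model serving layer (e.g. KV cache, prompt cache) are typically implemented by model API providers; system caches sit in front of the model and are implemented in the application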

major system caching mechanisms:
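- exact caching: reuse a stored response only when the incoming query exactly matches a cached query
- semantic caching: reuse a stored response when the incoming query is semantically similar to a cached query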
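
The difference between the two mechanisms can be sketched as follows. The bag-of-words `embed` function is a deliberately crude stand-in for a real embedding model, and the `0.8` similarity threshold is an arbitrary assumption; both would need tuning in practice.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SystemCache:
    def __init__(self, threshold: float = 0.8):
        self.exact: Dict[str, str] = {}
        self.entries: List[Tuple[Counter, str]] = []
        self.threshold = threshold

    def put(self, query: str, response: str) -> None:
        self.exact[query] = response
        self.entries.append((embed(query), response))

    def get(self, query: str) -> Optional[str]:
        # Exact caching: hit only if the query string is identical.
        if query in self.exact:
            return self.exact[query]
        # Semantic caching: hit if the closest cached query is similar enough.
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: call the model, then put() the result


cache = SystemCache()
cache.put("what is the capital of France", "Paris")
print(cache.get("What is the capital of France?"))
```

Semantic caching raises the hit rate but can return a wrong answer when two different questions look similar, so the threshold trades cost savings against correctness.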

5. Add agent patterns

Placed in a loop: when the system's output is not enough to accomplish the task, the output is fed back into the system as input for another cycle (similar to an intelligent agent)
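
The loop can be sketched generically. `step` and `is_done` are hypothetical callables standing in for one pass through the system and a task-completion check; the `max_steps` budget is an added assumption so that the loop always terminates.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def run_agent(task: S, step: Callable[[S], S], is_done: Callable[[S], bool],
              max_steps: int = 5) -> S:
    """Feed the system's output back in until the task is done or the budget runs out."""
    state = task
    for _ in range(max_steps):
        state = step(state)  # one full pass through the system
        if is_done(state):   # output is enough to accomplish the task
            break
    return state


# Toy usage: each cycle gathers one more note; the task needs three notes.
notes = run_agent([], lambda s: s + [f"note {len(s) + 1}"], lambda s: len(s) >= 3)
print(notes)
```

In a real system, `step` would wrap the whole pipeline above (context construction, guardrails, model call), and `is_done` might itself be a model-based judgment.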